The Effects and Interactions of Data Quality and Problem Complexity on Data Mining

نویسندگان

  • Roger H. Blake
  • Paul Mangiameli
چکیده

Data quality remains a persistent problem in practice and a challenge for research. In this study we focus on four of the most important dimensions of data quality accuracy, completeness, consistency, and timeliness. Definitions and conceptual models for these dimensions have not been collectively considered with respect to data mining in general and a key determinant of data mining outcomes, problem complexity, in particular. Conversely, these four dimensions of data quality have only been indirectly addressed by data mining research. Using definitions and constructs of data quality dimensions, our research shows for the first time that data quality and problem complexity have significant interaction effects on data mining outcomes. It also shows that the dimension of consistency can be concluded to have the largest effects of all four. Our results help address an important research challenge noted by March and Smith [15]. They also enable practitioners to assess data quality, prioritize efforts to improve it, and increase the performance of data mining for better decisions.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Novel Method for Selecting the Supplier Based on Association Rule Mining

One of important problems in supply chains management is supplier selection. In a company, there are massive data from various departments so that extracting knowledge from the company’s data is too complicated. Many researchers have solved this problem by some methods like fuzzy set theory, goal programming, multi objective programming, the liner programming, mixed integer programming, analyti...

متن کامل

Patient Relationship Management Method, an Approach toward Patient Satisfaction: A Case Study in a Public Hospital

Introduction: The aim of this project is to investigate the concepts of patient relationship management (PRM), patient satisfaction and their relationship. As healthcare industry has its own complexity, the so well defined idea in marketing which is CRM could not be implemented directly in health industry. Consequently the concept of PRM has been introduced. Metho...

متن کامل

a swift heuristic algorithm base on data mining approach for the Periodic Vehicle Routing Problem: data mining approach

periodic vehicle routing problem focuses on establishing a plan of visits to clients over a given time horizon so as to satisfy some service level while optimizing the routes used in each time period. This paper presents a new effective heuristic algorithm based on data mining tools for periodic vehicle routing problem (PVRP). The related results of proposed algorithm are compared with the resu...

متن کامل

3D Scene and Object Classification Based on Information Complexity of Depth Data

In this paper the problem of 3D scene and object classification from depth data is addressed. In contrast to high-dimensional feature-based representation, the depth data is described in a low dimensional space. In order to remedy the curse of dimensionality problem, the depth data is described by a sparse model over a learned dictionary. Exploiting the algorithmic information theory, a new def...

متن کامل

Opinion Mining, Social Networks, Higher Education

Background and Aim: With the advent of technology and the use of social networks such as Instagram, Facebook, blogs, forums, and many other platforms, interactions of learners with one another and their lecturers have become progressively relaxed. This has led to the accumulation of large quantities of data and information about studentschr('39') attitudes, learning experiences, opinions, and f...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008